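Goal: use masking (boolean indexing) and the query() function to compute:
1. The survival rate (survived=1) of passengers older than 70 (age > 70)
2. The survival rate (survived=1) of passengers younger than 15 (age < 15)
First, load the data: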
import seaborn as sns
import numpy as np
import pandas as pd
df = sns.load_dataset('titanic')
df.head(10)
Output:
survived pclass sex age sibsp parch fare embarked class who adult_male embark_town alive alone
1 1 1 female 38.0 1 0 71.2833 C First woman False Cherbourg yes False
2 1 3 female 26.0 0 0 7.9250 S Third woman False Southampton yes True
3 1 1 female 35.0 1 0 53.1000 S First woman False Southampton yes False
8 1 3 female 27.0 0 2 11.1333 S Third woman False Southampton yes False
9 1 2 female 14.0 1 0 30.0708 C Second child False Cherbourg yes False
10 1 3 female 4.0 1 1 16.7000 S Third child False Southampton yes False
11 1 1 female 58.0 0 0 26.5500 S First woman False Southampton yes True
14 0 3 female 14.0 0 0 7.8542 S Third child False Southampton no True
15 1 2 female 55.0 0 0 16.0000 S Second woman False Southampton yes True
18 0 3 female 31.0 1 0 18.0000 S Third woman False Southampton no False
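Method 1: masking (boolean indexing)
Use a boolean mask to compute the survival rate of passengers older than 70
survived_rate_70 = (df[df['age'] > 70]['survived'] == 1).mean()
print(f'Survival rate of passengers older than 70: {survived_rate_70}')
Output:
Survival rate of passengers older than 70: 0.027027027027027027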
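One detail worth keeping in mind when masking on age: a comparison against a missing value (NaN) evaluates to False, so passengers with no recorded age are left out of both the numerator and the denominator. The sketch below illustrates this on a small made-up frame; the name `toy` and its values are hypothetical and are not taken from the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not the real Titanic dataset).
toy = pd.DataFrame({
    'age': [80.0, 71.0, np.nan, 10.0, 74.0],
    'survived': [1, 0, 1, 1, 0],
})

# Boolean Series: True where age > 70. The NaN row compares as False.
mask = toy['age'] > 70
print(mask.tolist())

# Keep the 'survived' values of the rows where the mask is True, then average.
rate_over_70 = toy.loc[mask, 'survived'].mean()
print(rate_over_70)
```

Here three rows pass the mask and one of them survived, so the rate is one third; the row with a missing age is ignored even though that passenger survived.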
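Compute the survival rate of passengers younger than 15
survived_rate_15 = (df[df['age'] < 15]['survived'] == 1).mean()
print(f'Survival rate of passengers younger than 15: {survived_rate_15}')
Output:
Survival rate of passengers younger than 15: 0.5769230769230769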
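pandas also provides an actual mask() method, which works differently from boolean indexing: Series.mask(cond) replaces the values where cond is True with NaN rather than dropping rows. Because mean() skips NaN by default, masking out everyone outside the age group gives the same rate. This is a minimal sketch on hypothetical toy data (the frame `toy` and its values are made up for illustration), not necessarily the approach intended above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration only (not the real Titanic dataset).
toy = pd.DataFrame({
    'age': [4.0, 14.0, np.nan, 30.0, 9.0],
    'survived': [1, 0, 1, 0, 1],
})

# True for passengers younger than 15; a missing age compares as False.
in_group = toy['age'] < 15

# mask() replaces values where the condition is True, so pass the inverse:
# everyone NOT in the group gets NaN in the 'survived' column.
masked = toy['survived'].mask(~in_group)

# mean() ignores NaN, so this averages only the under-15 rows.
rate_under_15 = masked.mean()

# Same computation with boolean indexing, for comparison.
rate_by_indexing = toy.loc[in_group, 'survived'].mean()
print(rate_under_15, rate_by_indexing)
```

Boolean indexing is usually the more direct choice for this task; mask() is more useful when the original row alignment has to be preserved.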
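Method 2: the query() function
# Store the filtered rows in a new variable so the full df stays available for the next query
over_70 = df.query('age > 70')
survived_rate = over_70['survived'].mean()
print(survived_rate)
Output:
0.027027027027027027
Passengers older than 70 on the Titanic had a survival rate of 2.70%.
# Query the original df again, not the already filtered over-70 subset
under_15 = df.query('age < 15')
survived_rate = under_15['survived'].mean()
print(survived_rate)
Output:
0.5769230769230769
Passengers younger than 15 on the Titanic had a survival rate of 57.69%.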
Masking and query() give the same results.
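To check that the two approaches agree without hard-coding the age limit, query() can reference a Python variable with the `@` prefix. The sketch below wraps each approach in a small helper and compares them; the helper names `rate_by_mask` and `rate_by_query` and the toy data are assumptions made for illustration, not part of the original exercise.

```python
import numpy as np
import pandas as pd

def rate_by_mask(df, threshold):
    """Survival rate of passengers older than `threshold`, via boolean indexing."""
    return df.loc[df['age'] > threshold, 'survived'].mean()

def rate_by_query(df, threshold):
    """Same rate via query(); `@threshold` refers to the local Python variable."""
    return df.query('age > @threshold')['survived'].mean()

# Hypothetical toy data for illustration only (not the real Titanic dataset).
toy = pd.DataFrame({
    'age': [80.0, 71.0, np.nan, 10.0, 74.0, 5.0],
    'survived': [1, 0, 1, 1, 0, 0],
})

mask_result = rate_by_mask(toy, 70)
query_result = rate_by_query(toy, 70)
print(mask_result, query_result)
```

Neither helper reassigns the input frame, so the same DataFrame can be reused for any number of thresholds.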